Partitioning Algorithms for Improving Efficiency of Topic Modeling Parallelization

Abstract

Topic modeling is a powerful technique in data analysis and data mining, but it is generally slow. Many parallelization approaches have been proposed to speed up the learning process. However, they are usually not very efficient because of several kinds of overhead, especially the load-balancing problem. We address this problem by proposing three partitioning algorithms, which either run more quickly or achieve better load balance than current partitioning algorithms. These algorithms can easily be extended to improve parallelization efficiency on other topic models similar to LDA, e.g., Bag of Timestamps, which is an extension of LDA with time information. We evaluate these algorithms on two popular datasets, NIPS and NYTimes. We also build a dataset containing over 1,000,000 scientific publications in the computer science domain from 1951 to 2010 to experiment with Bag of Timestamps parallelization, which we design to demonstrate the proposed algorithms' extensibility. The results strongly confirm the advantages of these algorithms.
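
The load-balancing problem mentioned in the abstract can be illustrated with a small sketch. The abstract does not describe the partitioning scheme, so the following is an assumption for illustration only: the document-word count matrix is split into a P x P grid of blocks, and in each of P epochs the P processors work on the blocks of one diagonal, so an epoch takes as long as its largest block. The function names `block_costs` and `load_balance_ratio` are hypothetical and are not taken from the paper.

```python
def block_costs(counts, doc_group, word_group, p):
    """Sum token counts into a p x p grid of blocks.

    counts[d][w] is the count of word w in document d (illustrative input);
    doc_group[d] and word_group[w] give the assigned group in range(p).
    """
    blocks = [[0] * p for _ in range(p)]
    for d, row in enumerate(counts):
        for w, c in enumerate(row):
            blocks[doc_group[d]][word_group[w]] += c
    return blocks


def load_balance_ratio(blocks):
    """Ratio of the ideal cost (total / p) to the actual cost.

    Assumed schedule: in epoch l, processor m handles block (m, (m + l) % p),
    and the epoch cost is the largest block on that diagonal.
    A ratio of 1.0 means perfectly balanced; lower values mean idle processors.
    """
    p = len(blocks)
    total = sum(sum(row) for row in blocks)
    cost = sum(
        max(blocks[m][(m + l) % p] for m in range(p))
        for l in range(p)
    )
    return (total / p) / cost if cost else 1.0
```

Under these assumptions, a partitioning algorithm would choose `doc_group` and `word_group` so that the ratio is as close to 1.0 as possible; how the three proposed algorithms do this is not stated in the abstract.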

Bibliographic details

Authors: Tran, Hung Nghiep; Takasu, Atsuhiro
Year: 2015
Original format: PDF

Similar literature

1. A partitioned shift-without-invert algorithm to improve parallel eigensolution efficiency in real-space electronic transport [J]. Feldman Baruch, Zhou Yunkai. Computer Physics Communications, 2016.
2. A survey on partitioning models, solution algorithms and algorithm parallelization for hardware/software co-design [J]. Hou Neng, Yan Xiaohu, He Fazhi. Design Automation for Embedded Systems, 2019, issue 1a2.
3. Algorithms for parallelizing a mathematical model of forest fires on supercomputers and theoretical estimates for the efficiency of parallel programs [J]. N. V. Baranovskiy. Cybernetics and Systems Analysis, 2015, issue 3.
4. Partitioning algorithms for improving efficiency of topic modeling parallelization [C]. Hung Nghiep Tran, Takasu Atsuhiro. IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, 2015.
5. Using Statistical Analysis to Improve Data Partitioning in Algorithms for Data Parallel Processing Implementation [D]. Hidalgo Murillo, Manuel E. 2016.
6. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets [O]. D. D. Shrimankar, S. R. Sathe. 2016.
7. Improvement of the Efficiency of Genetic Algorithms for Scalable Parallel Graph Partitioning in a Multi-Level Framework [O]. Cédric Chevalier, François Pellegrini. 2006.
8. Improved spectral graph partitioning algorithm for mapping parallel computations [R]. Hendrickson, B, Leland, R. 1992.
